Some Notes on Matrices and Linear Operators

The Matrix As a Linear Operator

Let $A$ be an $m \times n$ matrix. The function
$$T_A:\mathbb{R}^n\to\mathbb{R}^m, \qquad T_A(\underline{x}) = A\underline{x},$$
is linear, that is,
$$T_A(a\underline{x} + b\underline{y}) = aT_A(\underline{x}) + bT_A(\underline{y})$$
if $\underline{x}, \underline{y} \in \mathbb{R}^n$ and $a, b \in \mathbb{R}$.
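As a quick numerical illustration (not part of the original notes, and assuming NumPy is available), the following sketch checks the linearity property for one of the matrices used in the examples below; the vectors and scalars are arbitrary sample values.

```python
import numpy as np

# One of the matrices from the examples below: T_A maps R^3 to R^2
A = np.array([[0.0, 2.0, 3.0],
              [1.0, 0.0, 1.0]])

x = np.array([1.0, 2.0, 3.0])
y = np.array([-1.0, 0.5, 2.0])
a, b = 2.0, -3.0

# Linearity: T_A(a*x + b*y) should equal a*T_A(x) + b*T_A(y)
lhs = A @ (a * x + b * y)
rhs = a * (A @ x) + b * (A @ y)
print(np.allclose(lhs, rhs))  # True
```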
Examples

If $A = \begin{bmatrix} 1 & 2 \end{bmatrix}$ then $T_A(\underline{x}) = x + 2y$, where $\underline{x} = \displaystyle{x \choose y} \in \mathbb{R}^2$.

If
$$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$
then
$$T_A\displaystyle{x \choose y} = \begin{bmatrix} y \\ x \end{bmatrix}.$$

If
$$A = \begin{bmatrix} 0 & 2 & 3 \\ 1 & 0 & 1 \end{bmatrix}$$
then
$$T_A \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{bmatrix} 2y + 3z \\ x + z \end{bmatrix}.$$

If
$$T \displaystyle{x \choose y} = \begin{pmatrix} x + y \\ 2x - 3y \end{pmatrix}$$
then $T(\underline{x}) = A\underline{x}$ if we set
$$A = \begin{bmatrix} 1 & 1 \\ 2 & -3 \end{bmatrix}.$$

Inner Products and Norms

If $x$ and $y$ are vectors, we define their inner product by
$$x \cdot y = x_1y_1 + x_2y_2 + \cdots + x_ny_n$$
where
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.$$

Details

If $x, y \in \mathbb{R}^n$ are arbitrary (column) vectors, then we define their inner product by
$$x \cdot y = x_1y_1 + x_2y_2 + \cdots + x_ny_n$$
where
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.$$
Note that we can also view $x$ and $y$ as $n \times 1$ matrices, and we see that $x \cdot y = x'y$.
The norm, or length, of a vector $x$ is defined by $\left\|x\right\|^2 = x \cdot x$. It may also be expressed as
$$\left\|x\right\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.$$
It is easy to see that for vectors $a, b$ and $c$ we have $(a+b)\cdot c = a\cdot c + b\cdot c$ and $a\cdot b = b\cdot a$.
Examples

Two vectors $x$ and $y$ are said to be orthogonal if $x \cdot y = 0$.

If
$$x = \begin{pmatrix} 3 \\ 4 \end{pmatrix} \quad \text{and} \quad y = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$$
then
$$x \cdot y = 3 \cdot 2 + 4 \cdot 1 = 10$$
and
$$\left\|x\right\|^2 = 3^2 + 4^2 = 25$$
so
$$\left\|x\right\| = 5.$$
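The same computation can be checked numerically (an illustration, assuming NumPy; not part of the original notes):

```python
import numpy as np

# The vectors from the example above
x = np.array([3.0, 4.0])
y = np.array([2.0, 1.0])

print(np.dot(x, y))          # 10.0, the inner product x . y
print(np.linalg.norm(x))     # 5.0, the norm ||x|| = sqrt(3^2 + 4^2)
```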
Orthogonal Vectors

Two vectors $x$ and $y$ are said to be orthogonal if $x \cdot y = 0$, denoted $x \perp y$.

Details

Two vectors $x$ and $y$ are said to be orthogonal if $x \cdot y = 0$, denoted $x \perp y$.
If $a, b \in \mathbb{R}^n$ then
$$\left\|a+b\right\|^2 = a\cdot a + 2\,a\cdot b + b\cdot b,$$
so
$$\left\|a+b\right\|^2 = \left\|a\right\|^2 + \left\|b\right\|^2 + 2\,a\cdot b.$$
Note that if $a \perp b$ then $\left\|a+b\right\|^2 = \left\|a\right\|^2 + \left\|b\right\|^2$, which is Pythagoras' theorem in $n$ dimensions.
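A small numerical check of this identity (an illustration only, assuming NumPy); the vectors are arbitrary, chosen here to be orthogonal so the cross term vanishes:

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, -1.0, 0.0])   # orthogonal to a, since a . b = 0

lhs = np.linalg.norm(a + b) ** 2
rhs = np.linalg.norm(a) ** 2 + np.linalg.norm(b) ** 2 + 2 * np.dot(a, b)
print(np.isclose(lhs, rhs))      # True for any a, b

# When a is orthogonal to b the cross term vanishes: Pythagoras in n dimensions
print(np.isclose(lhs, np.linalg.norm(a) ** 2 + np.linalg.norm(b) ** 2))  # True
```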
Linear Combinations of Independent Identically Distributed Variables

Suppose $X_1,\dots,X_n$ are independent random variables with means $\mu_1, \dots, \mu_n$ and variances $\sigma_1^2,\dots,\sigma_n^2$ (in the identically distributed case all the $\mu_i$ equal a common $\mu$ and all the $\sigma_i^2$ a common $\sigma^2$), and let $a_1,\dots,a_n$ be real constants. Consider the linear combination
$$Y=\sum a_i X_i.$$
Then the mean of $Y$ is
$$\mu_Y = \sum a_i \mu_i$$
and the variance is
$$\sigma_Y^2 = \sum a^2_i \sigma^2_i.$$
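A minimal sketch applying these two formulas (the helper name lin_comb_mean_var is just for illustration and not from the notes); it previews the example worked out below:

```python
import numpy as np

def lin_comb_mean_var(a, mu, sigma2):
    """Mean and variance of Y = sum(a_i * X_i) for independent X_i
    with means mu_i and variances sigma2_i (hypothetical helper)."""
    a, mu, sigma2 = map(np.asarray, (a, mu, sigma2))
    mean = np.sum(a * mu)            # mu_Y = sum a_i * mu_i
    var = np.sum(a**2 * sigma2)      # sigma_Y^2 = sum a_i^2 * sigma_i^2
    return float(mean), float(var)

# The example below: W = Y1 + 3*Y2 with means 2 and variances 4
print(lin_comb_mean_var([1.0, 3.0], [2.0, 2.0], [4.0, 4.0]))  # (8.0, 40.0)
```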
Examples

Consider two independent identically distributed random variables $Y_1, Y_2$ such that $E[Y_1]=E[Y_2]=2$ and $Var[Y_1]=Var[Y_2]=4$, and a specific linear combination of the two, $W=Y_1+3Y_2$.

We first obtain
$$E[W]=E[Y_1+3Y_2]=E[Y_1]+3E[Y_2]=2+3\cdot 2=2+6=8.$$
Similarly, we can first use independence to obtain
$$Var[W]=Var[Y_1+3Y_2]=Var[Y_1]+Var[3Y_2]$$
and then (recall that $Var[aY]=a^2Var[Y]$)
$$Var[Y_1]+Var[3Y_2]=Var[Y_1]+3^2Var[Y_2]=1^2 \cdot 4+3^2\cdot 4= 1 \cdot 4 + 9 \cdot 4= 40.$$
Normally, we just write this up in a simple sequence
$$Var[W]=Var[Y_1+3Y_2]=Var[Y_1]+3^2Var[Y_2]=1^2 \cdot 4+3^2\cdot 4 = 1 \cdot 4 + 9 \cdot 4= 40.$$
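As an illustrative check (not part of the original notes), a short simulation reproduces these values; the normal distribution is an arbitrary choice with the stated mean and variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Y1, Y2 iid with mean 2 and variance 4 (normal chosen only for illustration)
Y1 = rng.normal(loc=2, scale=2, size=n)
Y2 = rng.normal(loc=2, scale=2, size=n)
W = Y1 + 3 * Y2

print(W.mean())   # close to 8
print(W.var())    # close to 40
```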
Covariance Between Linear Combinations of Independent Identically Distributed Random Variables

Suppose $Y_1,\ldots,Y_n$ are independent identically distributed, each with mean $\mu$ and variance $\sigma^2$, and $a,b\in \mathbb{R}^n$. Writing
$$Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix},$$
consider the linear combinations $a'Y$ and $b'Y$.
Details

The covariance between random variables $U$ and $W$ is defined by
$$Cov(U,W)= E[(U-\mu_u)(W-\mu_w)]$$
where $\mu_u=E[U]$ and $\mu_w=E[W]$.

Now, let $U=a'Y=\sum Y_ia_i$ and $W=b'Y=\sum Y_ib_i$, where $Y_1,\ldots,Y_n$ are independent identically distributed with mean $\mu$ and variance $\sigma^2$. Then we get
$$Cov(U,W)= E\left[\left(a'Y-\textstyle\sum a_i\mu\right)\left(b'Y-\textstyle\sum b_i\mu\right)\right] = E\left[\left(\textstyle\sum a_iY_i -\sum a_i\mu\right)\left(\textstyle\sum b_jY_j -\sum b_j\mu \right)\right]$$
and after some tedious (but basic) calculations we obtain
$$Cov(U,W)=\sigma^2\, a\cdot b.$$
Examples

If $Y_1$ and $Y_2$ are independent identically distributed, then
$$Cov(Y_1+Y_2, Y_1-Y_2) = Cov \left( (1,1) \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix},\ (1,-1) \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \right) = (1,1) \begin{pmatrix} 1 \\ -1 \end{pmatrix} \sigma^2 = 0,$$
and in general $Cov(\underline{a}'\underline{Y}, \underline{b}'\underline{Y})=0$ if $\underline{a}\perp \underline{b}$ and $Y_1,\ldots,Y_n$ are independent identically distributed.
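A simulation sketch of this example (illustration only; the exponential distribution is an arbitrary iid choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Y1, Y2 iid; exponential with mean 2 and variance 4, chosen only for illustration
Y1 = rng.exponential(scale=2.0, size=n)
Y2 = rng.exponential(scale=2.0, size=n)

U = Y1 + Y2   # a = (1, 1)
W = Y1 - Y2   # b = (1, -1), orthogonal to a

# Sample covariance should be close to sigma^2 * (a . b) = 0
print(np.cov(U, W)[0, 1])  # approximately 0
```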
Random Vectors

$Y = (Y_1, \ldots, Y_n)$ is a random vector if $Y_1, \ldots, Y_n$ are random variables.

Details

If $E[Y_i] = \mu_i$ then we typically write
$$E[Y] = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix} = \mu.$$
If $Cov(Y_i, Y_j) = \sigma_{ij}$ and $Var[Y_i]=\sigma_{ii} = \sigma_i^2$, then we define the matrix
$$\boldsymbol{\Sigma} = (\sigma_{ij})$$
containing the variances and covariances. We call this matrix the covariance matrix of $Y$, typically denoted $Var[Y] = \boldsymbol{\Sigma}$ or $CoVar[Y] = \boldsymbol{\Sigma}$.
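For instance (an illustration with NumPy, not part of the notes), the covariance matrix of a random vector can be estimated from simulated draws; the particular mean vector and matrix here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a 2-dimensional random vector Y = (Y1, Y2) with known covariance
true_mu = np.array([1.0, -1.0])
true_Sigma = np.array([[2.0, 0.5],
                       [0.5, 1.0]])
samples = rng.multivariate_normal(true_mu, true_Sigma, size=500_000)

print(samples.mean(axis=0))           # approximately mu
print(np.cov(samples, rowvar=False))  # approximately Sigma
```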
Examples

If $Y_1, \ldots, Y_n$ are independent identically distributed with $E[Y_i] = \mu$ and $Var[Y_i] = \sigma^2$, $a,b\in\mathbb{R}^n$, $U=a'Y$, $W=b'Y$, and
$$T = \begin{bmatrix} U \\ W \end{bmatrix},$$
then
$$E[T] = \begin{bmatrix} \sum a_i \mu \\ \sum b_i \mu \end{bmatrix}, \qquad Var[T] = \boldsymbol{\Sigma} = \sigma^2 \begin{bmatrix} \sum a_i^2 & \sum a_i b_i \\ \sum a_ib_i & \sum b_i^2 \end{bmatrix}.$$

If $\underline{Y}$ is a random vector with mean $\boldsymbol{\mu}$ and variance-covariance matrix $\boldsymbol{\Sigma}$, then
$$E[a'Y] = a'\mu$$
and
$$Var[a'Y] = a' \boldsymbol{\Sigma} a.$$
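A sketch checking these two formulas numerically (illustration only; the vector $a$, mean $\mu$ and matrix $\Sigma$ are arbitrary, and the multivariate normal is just a convenient distribution with that mean and covariance):

```python
import numpy as np

rng = np.random.default_rng(3)

a = np.array([1.0, 2.0, -1.0])
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])

# Theoretical mean and variance of a'Y
print(a @ mu)             # E[a'Y] = a' mu
print(a @ Sigma @ a)      # Var[a'Y] = a' Sigma a

# Simulation check
Y = rng.multivariate_normal(mu, Sigma, size=500_000)
U = Y @ a
print(U.mean(), U.var())  # close to the theoretical values
```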
Suppose
$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}$$
is a random vector with $E[\mathbf{Y}] = \mu$ and $Var[\mathbf{Y}] = \boldsymbol{\Sigma}$, where the variance-covariance matrix is
$$\boldsymbol{\Sigma} = \sigma^2 I.$$
Details

Note that if $Y_1, \ldots, Y_n$ are independent with common variance $\sigma^2$ then
$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{1}^{2} & \sigma_{12} & \sigma_{13} & \ldots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^{2} & \sigma_{23} & \ldots & \sigma_{2n} \\ \sigma_{31} & \sigma_{32} & \sigma_3^{2} & \ldots & \sigma_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \sigma_{n3} & \ldots & \sigma_n^{2} \end{bmatrix} = \begin{bmatrix} \sigma_{1}^{2} & 0 & \ldots & \ldots & 0 \\ 0 & \sigma_2^{2} & \ddots & & \vdots \\ \vdots & \ddots & \sigma_3^{2} & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \ldots & \ldots & 0 & \sigma_n^{2} \end{bmatrix} = \sigma^2 \begin{bmatrix} 1 & 0 & \ldots & \ldots & 0 \\ 0 & 1 & \ddots & & \vdots \\ \vdots & \ddots & 1 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \ldots & \ldots & 0 & 1 \end{bmatrix} = \sigma^2 I.$$

If $A$ is an $m \times n$ matrix, then
$$E[A\mathbf{Y}] = A \boldsymbol{\mu}$$
and
$$Var[A\mathbf{Y}] = A \boldsymbol{\Sigma} A'.$$
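Finally, a sketch of these last two rules for the case $\boldsymbol{\Sigma} = \sigma^2 I$ (illustration only; the matrix $A$, the mean vector and $\sigma^2$ are arbitrary choices, and the normal distribution is used just to generate independent components):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2 = 4.0
n = 3

mu = np.array([1.0, 2.0, 3.0])
Sigma = sigma2 * np.eye(n)           # Sigma = sigma^2 * I

A = np.array([[1.0, 1.0, 0.0],       # an arbitrary 2x3 matrix
              [0.0, 1.0, -1.0]])

# Theoretical mean and covariance of A @ Y
print(A @ mu)                 # E[A Y] = A mu
print(A @ Sigma @ A.T)        # Var[A Y] = A Sigma A'

# Simulation check with independent components of variance sigma^2
Y = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=(500_000, n))
Z = Y @ A.T
print(Z.mean(axis=0))          # close to A mu
print(np.cov(Z, rowvar=False)) # close to A Sigma A'
```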